This assignment is for ETC5521 Assignment 1 by Team taipan comprising of Helen Evangelina and Yiwen Jiang.

1 Introduction and motivation

This report presents the findings of the woodland caribou between 1988 to 2016 following the tracking data conducted under B.C. Ministry of Environment & Climate Change. This report mainly analyses the changes in the number of woodland caribou, and other analyses include the habitats changes caused by seasonal differences, the effects of the implementation of management plans and the causes of tag deployment ended. In the following section, we will describe the data set, where the data came from, and what is the data prepared for. The data description also includes how we transform and clean the raw data for analysis. Our statistical programming used for analysis is R and Rstudio.

1.1 Motivation

Caribou are the only large herbivore that is widely distributed in the high-elevation habitat and act as agents for plant and lichen diversity through the mechanisms of trampling and foraging. The Caribou has also been a significant resource for indigenous peoples for millennia. (BC Ministry of Environment (2014)). The survival rate of the Caribou is generally relatively low due to predation by Canis Lupus (wolf). The Caribou listed as “vulnerable” on the International Union for Conservation of Nature (IUCN) Red List. With the Caribou being listed as “Threatened”, it is essential to monitor the number of the Caribou as monitoring is vital to effective conservation. We will represent our findings in this report through the exploration of the Caribou tracking data.

1.2 Data Sources

The tracking data was collected by B.C. Ministry of Environment & Climate Change over 28 years (1988 - 2016), the data was prepared for the study of management and recovery of the caribou. It includes the information of 286 Caribou and covered 250,000 locations.

1.3 Data Limitation

Visualise the missing value in individuals data

Figure 1.1: Visualise the missing value in individuals data

  • There are several limitations associated with this dataset. The most noticeable thing about this dataset is that there are a lot of NAs which causes limited analysis. We observe that in the individuals data over half of the values are missing (Refer to Figure 1.1). Not many analysis can be performed because by removing the NAs, there would not be sufficient data left to be analysed and insufficient data would lead to inaccurate result. For example, in the pregnant variable, there are 93.36% of the values missing.

  • The Caribou has a low reproductive rate due to females only have one calf per year, and females do not reproduce until they are two years old. To analysis, the sex ratio should be a good indicator of the trend of the number of Caribou. However, there are only five males Caribou out of 286. The analysis result will exist bias when we use the sex ratio as an indicator.

  • Another limitation is deploy_off_type mainly consists of “unknown”, indicating improper records. This would not lead to accurate analysis. We wanted to see if the equipment failure on the deploy_off_type is many to see if the equipment are working properly or not - and if not, then the quality of the equipment should be improved. However, it turned out that there isn’t any equipment failure value in the deploy_off_type. Same thing with the death_cause which consists many “unknown” values. Other than that, there is inconsistent naming of the values inside the dead_cause variable.

2 Data description

2.1 Overview of the dataset

The dataset tracks woodland caribou in northern British Columbia published by the Movebank Data Repository at https://www.datarepository.movebank.org/handle/10255/move.955. This data was collected by putting trackers of almost 250,000 location tags on 260 caribou, from 1988 to 2016, which was accessed through Movebank.

The boreal woodland caribou (Rangifer tarandus caribou), also known as woodland caribou, boreal forest caribou and forest-dwelling caribou, is a North American subspecies of the reindeer (or the caribou in North America) with the vast majority of animals in Canada. They prefer lichen-rich mature forests and mainly live in marshes, bogs, lakes and river regions. Caribou are considered as an ancient member of the deer family Cervidae (Banfield, 1974). They are smaller than Moose (Alces americanus) and Elk (Cervus canadensis), standing 1.0–1.2 m high at the shoulder (Thomas and Gray, 2002). Due to the caribou is classified as “Vulnerable” on the International Union for the Conservation of Nature’s (IUCN) Red List. The data provided for the study of the B.C. Ministry of Environment & Climate Change to report the management and recovery of the caribou.

Because this data set is used for analysing the reproduction of species, the data is obtained by observation rather than experiment. There is no treatment group and the control group. The time frame of collection was started from 1988 and end of 2016. Movebank captures the locations of individual animals over time by tracking the bio-logging sensors attached to animals (Kranstauber et al., 2011). The data sets were separated into two data files and provided by .csv format. The following are the variables in each data.

  • The individual data comes from Mountain caribou in British Columbia-reference-data.csv. The data contains the relevant information of 286 caribou. The variables are showing in the following table:
Variable Class Description
animal_id character Individual identifier for animal
sex character Sex of animal
life_stage character Age class (in years) at beginning of deployment
pregnant logical Whether animal was pregnant at beginning of deployment
with_calf logical Whether animal had a calf at time of deployment
death_cause character Cause of death
study_site character Deployment site or colony, or a location-related group such as the herd or pack name
deploy_on_longitude double Longitude where animal was released at beginning of deployment
deploy_on_latitude double Latitude where animal was released at beginning of deployment
deploy_on_comments character Additional information about tag deployment
deploy_off_longitude double Longitude where deployment ended
deploy_off_latitude double Latitude where deployment ended
deploy_off_type character Classification of tag deployment end (see table below for full description
deploy_off_comments character Additional information about tag deployment end
  • The location comes from Mountain caribou in British Columbia-gps.csv. The data contains location information of each counted caribous for every 4 fours.
Variable Class Description
event_id double Identifier for an individual measurement
animal_id character Individual identifier for animal
study_site character Deployment site or colony, or a location-related group such as the herd or pack name
season character Season (Summer/Winter) at time of measurement
timestamp datetime Date and time of measurement
longitude double Longitude of measurement
latitude double Latitude of measurement

2.2 Data cleaning processes

The data being used is the dataset from the Science update for the South Peace Northern Caribou (Rangifer tarandus caribou pop. 15) in British Columbia available from Movebank (BC Ministry of Environment, 2014). The raw datasets are first read by using read_csv() function. It can be noticed from the raw datasets that the variable names use “-“ instead of “_”. Using dash in a variable name might result to issues, as the valid variable name in R should consist of dot or underline characters. Another problem from this dataset is the values in the “animal-life-stage” consist of spacing, which might lead to issues as it is inconsistent. Another noticeable thing is the datasets have a lot of NA values. Therefore, the data needs to be cleaned by using the tidyverse and janitor libraries.

To clean the individuals data, firstly clean_names() function from the janitor package is used to return the data.frame with clean names. What this function does is changing the variable names into a tidier form. As mentioned before, using dash in variable names is not appropriate in R. Notice that the raw dataset has names like “deploy-off-latitude” which is changed into “deploy_off_latitude”. Next is to assigned the result to transmute(), which will compute new columns but will drop existing columns. This is done to make the variable names in a tidier way. The whitespace in the life stage is gotten rid to address inconsistent spacing by using str_remove_all() function. After tidying the variable names with transmute, the “reproductive_condition” variable is separated into “pregnant” and “with_calf” by using the separate() function as this variable actually contains two dimensions, and then assigning those variables into new columns by using the mutate() function which consists of either TRUE or FALSE value.

The locations data is cleaned by using the same method as the individuals data, which includes cleaning the name first by using clean_names() function to arrive to a data.frame with clean names. The next step is to use transmute() function to compute new columns with dropping existing columns. After cleaning both datasets, the final datasets are written into csv format by using write_csv() function.

# Load libraries
library(tidyverse)
library(janitor)

# Import data
individuals_raw <- read_csv("./caribou-location-tracking/raw/Mountain caribou in British Columbia-reference-data.csv")
locations_raw <- read_csv("./caribou-location-tracking/raw/Mountain caribou in British Columbia-gps.csv")

# Clean individuals
individuals <- individuals_raw %>%
  clean_names() %>%
  transmute(animal_id,
            sex = animal_sex,
            # Getting rid of whitespace to address inconsistent spacing
            # NOTE: life stage is as of the beginning of deployment
            life_stage = str_remove_all(animal_life_stage, " "),
            reproductive_condition = animal_reproductive_condition,
            # Cause of death "cod" is embedded in a comment field
            death_cause = str_remove(animal_death_comments, ".*cod "),
            study_site,
            deploy_on_longitude,
            deploy_on_latitude,
            # Renaming to maintain consistency "deploy_on_FIELD" and "deploy_off_FIELD"
            deploy_on_comments = deployment_comments,
            deploy_off_longitude,
            deploy_off_latitude,
            deploy_off_type = deployment_end_type,
            deploy_off_comments = deployment_end_comments) %>%
  # reproductive_condition actually has two dimensions
  separate(reproductive_condition, into = c("pregnant", "with_calf"), sep = ";", fill = "left") %>%
  mutate(pregnant = str_remove(pregnant, "pregnant: ?"),
         with_calf = str_remove(with_calf, "with calf: ?")) %>%
  # TRUE and FALSE are indicated by Yes/No or Y/N
  mutate_at(vars(pregnant:with_calf), ~ case_when(str_detect(., "Y") ~ TRUE,
                                                   str_detect(., "N") ~ FALSE,
                                                   TRUE ~ NA))

# Clean locations
locations <- locations_raw %>%
  clean_names() %>%
  transmute(event_id,
            animal_id = individual_local_identifier,
            study_site = comments,
            season = study_specific_measurement,
            timestamp,
            longitude = location_long,
            latitude = location_lat)

# Write to CSV
write_csv(individuals, "./caribou-location-tracking/individuals.csv")
write_csv(locations, "./caribou-location-tracking/locations.csv")

2.3 Possible questions

This dataset is primarily used to analyse the changes in the number of caribou from 1988 to 2016 to observe the survival of the species. As the management came up with a plan, we would like to analyse whether the management plan is effective in increasing the number of caribou over time.

The primary question to answer from this dataset is how is the trend of the number of caribou over time?

From the primary question, we came up with four secondary questions, which are as follows: - Does the habitats vary between summer and winter?
- How is the trend of the classification of tag deployment end (deploy_off_type)?
- Has the management plan increased the number of caribou?

3 Analysis and findings

caribou_trend <- locations %>% 
  separate(timestamp, c("date", "time"), sep = " ") %>% 
  mutate(month = month(date), year = year(date)) %>%
  group_by(animal_id, study_site, month, year) %>% 
  summarise(n = 1) %>% 
  group_by(month, year) %>% 
  summarise(count = sum(n)) %>%
  mutate(date = as.Date(paste(year, as.numeric(month), "01",  sep="-"), 
                   format = "%Y-%m-%d"))
## `summarise()` regrouping output by 'animal_id', 'study_site', 'month' (override with `.groups` argument)
## `summarise()` regrouping output by 'month' (override with `.groups` argument)
trend_plot <- ggplot(caribou_trend, aes(x = date, y = count)) +
  geom_line() +
  xlab("") +
  ylab("Number of Caribou been tracked") +
  theme_bw()

ggplotly(trend_plot)

Figure 3.1: Monthly number of Caribou been tracked between 1988 to 2016

[FILL] Should include at least one plot or numerical summary for each of your questions, that helps the reader arrive at an answer. You should also write paragraphs describing the methods, summaries and findings.

3.1 Question 1: What can we learn about the sex ratio and population change of the caribou?

# get map data
caribou_map <- get_map(location = c(-125, 52.5, -119, 57.6), source = "osm") 
## Source : http://tile.stamen.com/terrain/7/19/38.png
## Source : http://tile.stamen.com/terrain/7/20/38.png
## Source : http://tile.stamen.com/terrain/7/21/38.png
## Source : http://tile.stamen.com/terrain/7/19/39.png
## Source : http://tile.stamen.com/terrain/7/20/39.png
## Source : http://tile.stamen.com/terrain/7/21/39.png
## Source : http://tile.stamen.com/terrain/7/19/40.png
## Source : http://tile.stamen.com/terrain/7/20/40.png
## Source : http://tile.stamen.com/terrain/7/21/40.png
## Source : http://tile.stamen.com/terrain/7/19/41.png
## Source : http://tile.stamen.com/terrain/7/20/41.png
## Source : http://tile.stamen.com/terrain/7/21/41.png
ggmap(caribou_map) +
  geom_point(data = locations,
             aes(x = longitude, y = latitude, color = season),
              alpha = 0.5, size = 0.5) +
  theme_void() +
  theme(legend.position = "none",
        panel.grid = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank())

ggmap(caribou_map) +
  geom_point(data = locations, 
             aes(longitude, latitude, col = study_site), size = 0.3, alpha = 0.9) +
  gghighlight(unhighlighted_params = list(colour = "#F2EFC7"), use_direct_label = FALSE) +
  palettetown::scale_colour_poke(pokemon = "golbat") +
  guides(colour = guide_legend(title = "Herd", override.aes = list(size = 4))) +
  facet_wrap(~season, strip.position = "bottom") +
  labs(title = "Seasonal differences of habitats (Coloured by herds)") +
  theme_void() 

ggmap(caribou_map) +
  geom_point(data = locations,
             aes(x = longitude, y = latitude, color = season),
              alpha = 0.1, size = 0.5) +
  theme(legend.position = "none",
        panel.grid = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank()) +
  facet_wrap(study_site~season, ncol = 4)

3.2 Question 3: How is the trend of the classification of tag deployment end (deploy_off_type)? What is the highest cause of dead?

In order to answer this question, we analyse the deploy_off_type variable by visualising it on bar charts.

deploy_type <- individuals %>%
  group_by(deploy_off_type) %>%
count(deploy_off_type) %>%
  ggplot(aes(x = reorder(deploy_off_type, -n),
             y = n)) +
  geom_col(aes(fill = deploy_off_type)) +
  ggtitle("Classification of tag deployment end") +
  xlab("Tag deployment end type") +
  ylab("Total number")  +
  theme(axis.text.y = element_text(size = 12, color = "black", hjust = 1),
        plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
        plot.title.position = "plot")

ggplotly(deploy_type)

Figure 3.2: Distribution of the classification of tag deployment end.

It can be seen from figure 3.2 that out of all the deploy_off_type stated in the data description, there are only four types available in the dataset, which are dead, removal, other and unknown. The highest number of type, with 99, is unknown, which indicates that there was no proper tracking and documentation. The second highest type is removal, which means that the tag was purposefully removed from the animal. Following removal is dead with 60, which is still a lot. The least number of type is other, which might consist of other deployment end types.

As there are several data with dead as the deployment end type, we are looking at the cause of death of caribou in more details. It is important to look at the cause of death as understanding the cause of death of the caribou would allow further monitoring of the caribou to minimise their death. To analyse the cause of death, the individuals dataset is wrangled by filtering the deploy_off_type. As there are some NAs in the death_cause column, the NAs are removed.

And then a bar plot is created and reordered by the number of the death_cause occuring in the data.

plot_death <- individuals_dead %>%
  group_by(death_cause) %>%
  count(death_cause) %>%
  ggplot(aes(x = reorder(death_cause, n), y = n, fill = death_cause)) +
  geom_col() +
  coord_flip() +
  theme(legend.position = "none") +
  ggtitle("Distribution of death per cause of death") +
  xlab("Cause of death") +
  ylab("Total number of death") +
  theme(plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
        plot.title.position = "plot")

ggplotly(plot_death)

Figure 3.3: Total number of death caused by each cause of death.

It can be seen from figure 3.3 that the highest number of cause of death is predation by wolf, which is 18 deaths, if the suspected predation ones are included. The other substantial part of the cause of death includes predation by bear or predation by unknown animals, which concludes that the majority of the death of the caribou was due to predation with around 30% of the total death. The other causes such as vehicle collision, train collision, and accidentals are only partaking a very small part of the death causes. However, there is also a significant amount of unknown which indicates that monitoring and documentation is not really good.

What we can learn from the analysis is that the majority cause of calf mortality is predation, primarily by wolf, and it is important to take this into consideration when the management are developing plans to increase the number of caribou. The seasonal habitat change that is compromising spatial separation between caribou and the predators can increase the risk of predation. Grey wolves normally live in low elevations, however they occasionally hunt at higher elevations where caribou live (https://www2.gov.bc.ca/assets/gov/environment/plants-animals-and-ecosystems/wildlife-wildlife-habitat/caribou/science_update_final_from_web_jan_2014.pdf). Another reason for the increasing encounter rate of caribou by wolves is the development of linear features such as pipelines and roads which facilitates rapid and widespread movements of wolves (McKenzie et al., 2012).

3.3 Question 4: Has the management plan increased the number of caribou?

According to the Peace Northern Caribou Plan endorsed in November 2012, the goal is to increase the South Peace Northern Caribou to 1,200 animals within 21 years, which has been articulated in implementation plans. Therefore, we are analysing whether the management plan has increased the number of caribou by looking at the number of caribou in the dataset.

trendnumber <- numbercaribou %>%
  ggplot(aes(x = year,
             y = total)) +
  geom_line() +
  ggtitle("Trend of number of caribou yearly") +
  xlab("Year") +
  ylab("Total number of caribou")  +
  theme(axis.text.y = element_text(size = 12, color = "black", hjust = 1),
        plot.title = element_text(size = 14, face = "bold", hjust= "center"),
        plot.title.position = "plot")

ggplotly(trendnumber)

Figure 3.4: The trend of the number of caribou per year.

Figure 3.4 presents the trend of the number of caribou yearly according to the locations dataset. It can be noticed that the number of caribou fluctuated overtime. It declined by a significant amount in 2001, however the number increased sharply again in 2002 and 2003, and declined again in 2006. The peak was at 2008, where the number of caribou reached 70, which is more than double the number of caribou in 2006. After 2008, the number decreased again. The management plan which was started to be implemented in 2012 seems to be ineffective as the number of caribou still ranges at around 20-30 caribou.

Next, we are looking at the number of caribou’s trend yearly in more details by looking at the trend per study_site.

study_site_trend <- joined_summary %>%
  ggplot(aes(x = year,
             y = number,
             group = study_site,
             colour = study_site)) +
  geom_line() +
  ggtitle("Trend of number of caribou yearly by study site") +
  xlab("Year") +
  ylab("Total number of caribou") +
  theme(axis.text.y = element_text(size = 12, color = "black", hjust = 1),
        plot.title = element_text(size = 14, face = "bold"),
        plot.title.position = "plot")

ggplotly(study_site_trend)
## Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
## Please use `group_by()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

Figure 3.5: Trend of number of caribou per year according to the study site.

Figure 3.5 above is showing the changes in the number of caribou yearly based on the study site. The most noticeable thing here is that not all study sites are being recorded every year, and that the trend of number of caribou varies over the study site. From 1988 to 2001, the only study site being recorded was Hart Ranges, in where there was a huge drop between this period. Most of the other study sites started to be recorded since 2002, except for Scott which started in 2013. Most of the study sites are showing a sharp decline in the number of caribou in 2010, which explains the huge drop in the overall trend mentioned before. The number of caribou in Narraway is super low which is due to the herds that are living in this site tend to stay in the lower-elevation area during summer, thus are more exposed to predation risk. Looking at the number of caribou from 2012, there has not been any significant changes in the number of caribou. Therefore, it appears that the management plan has not increased the number of caribou.

So to answer whether the management plan has increased the number of caribou, it appears that the management plan has not increased the number of caribou. And it is unlikely that the number of caribou would increase by that much in 2021. However, this analysis might not be accurate as there are some limitations associated with this analysis. Firstly, some study sites were only being monitored for certain years, which results to inaccurate result to analyse the trend of number of caribou. Secondly, there might be some caribou that were not tracked in certain years as we are calculating the number of caribou per year by using the documented locations data. It might not be an accurate analysis as there might be some caribou that were not tracked in certain years, or that the locations were not being documented inside the locations dataset.

4 References